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ABSTRACT 

Hardness ratios are commonly used in X-ray photometry to indicate spectral 
properties roughly It is usually defined as the ratio of counts in two different 
wavebands. This definition, however, is problematic when the counts are very 
limited. Here we instead define hardness ratio using the A parameter of Poisson 
processes, and develop an estimation method via Bayesian statistics. Our Monte 
Carlo simulations show the validity of our method. Based on this new definition, 
we can estimate the hydrogen column density for the photoelectric absorption of 
X-ray spectra in the case of low counting statistics. 

Subject headings: methods: statistical — X-rays: ISM — X-rays: general 

1. Introduction 

In recent years, high angular resolution X-ray telescopes make it possible to detect 
X-ray sources with only a few counts. This is very different from the optical photometry. 
Because of these low counts, the Poisson processes in corresponding wavebands cannot be 
approximated to Gaussian distribution. Therefore the statistics will be very different in 
some estimations and calculations than used before. In recent years Bayesian method has 
gained many applications (e.g. van Dyk et al. 2001 and references therein) since it has more 
advantages in low count cases than traditional statistics. 

Hardness ratios are widely used in high energy astrophysics since faint sources with only 
limited counts cannot give any satisfying spectral modeling. In X-ray detection, hardness 
ratios are normally used to show spectral properties roughly (e.g. Tennant et al., 2001; 
Sivakoff, Sarazin & Carlin 2004). Hardness ratios are usually defined as the ratio of counts 
in different wavebands (HR = b/a) or the ratio of the difference and sum of counts in two 
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wavebands (HR = (b — a)/(b + a)), a and b are counts in two wavebands A and B. On 
the other hand, for the spectra of X-ray sources, the photoelectric absorption, quantified 
with the hydrogen column density Njj, cannot be neglected. Hydrogen column density 
contains many kinds of important information, such as the radial distance and the interstellar 
circumstance of the sources. For low count sources, hydrogen column density is hard to know 
since no reliable spectral fitting can be made. However interstellar absorption is energy 
dependent. Consequently the information of hydrogen column density can be drawn from 
the hardness ratios. In this paper, we first give a new definition of hardness ratio and its 
estimation method, and then we discuss the procedure to estimate the hydrogen column 
density accordingly. 



2. The distribution of A parameter under certain counts 

We begin our discussion with the following problem: Suppose that one experiment 
obtained two counts from two different Poisson distributions, we need to: (1) Estimate 
the ratio of the expectation values of the two Poisson distributions, and (2) Construct the 
confidence interval of the ratio. 

The expectation values of the Poisson processes are just the A parameters of the Poisson 
distribution P(n\\) = ^-e _A . Therefore the above problem may be formulated as follows: 
Suppose a and b are two counts corresponding to two different Poisson processes A and B 
with their parameters as A^ and A# respectively, \ a /\b and its confidence interval needs to 
be estimated. To solve this problem we first need to derive the distribution of A parameter 
under certain counts, i.e., the conditional distribution of \ A and As, as follows. 



P{n A = a\\ A = x)p(\ A = x) 

p(\ A = x\n A = a) = -j^— — — — -. (1) 

J P(n A = a\\ A = t)p(\ A = t)dt 

First we assume, as a pragmatic convention, a uniform prior for the A parameter. 

P{n A = a\\ A = x) 
Pu (X A = x\n A = a) = s ~ p{nA = alXA = t)dt 

= (2) 

Similarly, 

y b e -y 

Pu(^b = y\n B = b) = — £j— . (3) 
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Fig. 1. — The probability density function of when 6 = 4 under the two different prior 
assumption. 



This continuous distribution is Gamma distribution, as shown in Fig. 1. 

In addition, we use Jeffreys prior (p(A) = 1/VT), which may be more advantageous 
over the uniform prior commonly used, because the inferences derived from Jeffreys prior 
are parameterization- invariant (See Kass & Wasserman 1996 for detail of this prior). Under 
this prior, the conditional distribution of A is 



pj(\ A = x\n A = a) 



P(n A = q|Aa = x){l/yjx) 
J °°P(n A = a\\ A = t)(l/Vi)dt 

2 a x a-l/2 e -x 

(2a-l)!VF 



(4) 
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and 

2b x b-l/2 e -x 

pj(X B = x\n B = b) = (26 _ 1)!!0F - (5) 

To account for the background contamination, suppose that a is the count corre- 
sponding to a Poisson process A with the addition of a Poisson background process A bkg , 
the expectation value of the process A bkg is assumed to be known as \ Ab . According 
to the properties of Poisson processes, the sum of two Poisson processes is also a Pois- 
son process with the parameter A^ + X Ab . So the probability of the total count n is 
P(n = a\X A = x,\ Ab ) = e -( x +^Ab) m Apply the Bayesian assumption and the uni- 

form prior distribution assumption, we obtain the conditional distribution of \ A , as follows. 



p u (X A = x\n A = a, X Ab ) = 



P(n A = a\\ A = x, \ Ab )p u (\ A = x) 
J °° P{n A = a \\ A = t, \ Ab )p u (\ A = t)dt 

P(n A = a\\ A = x, \ Ab ) 
/ °° P{n A — a\\ A — t, \ Ab )dt 
(x + \ Ab ) a e~ x 

a k 
k=0 



(6) 



When \ Ab is much smaller than a, this result is same as equation(2). Also we can obtain the 
conditional distribution under the Jeffreys prior distribution assumption, 

, , , (i- + A. 4t )°e-'(l/V5) ... 

M^ = X \n A = a,X M ) = r(t + A4t)ae _, (lM)dt . (') 

and it is same as equation(4) when \ Ab is much smaller than a. 



3. Estimate the Hardness Ratio 

There are two different definitions of hardness ratio, R = \ a /\b and HR = (X B — 
Aa/)/(A^ + Aa)- In traditional method, the estimate of R and HR are a/b and (b — a)/(b + a) 
respectively, and the errors are propagated under the Gaussian distribution, i.e., 



° R = bV^ + b^ 



(b + a)< 



OHR = 77—. 77, • (o) 
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Here we propose a method to estimate the hardness ratio based on the Bayesian method. 
Both the uniform prior and the Jeffreys prior will be used. When X^b is much smaller than 
a, we use equation(2) (under the uniform prior) or equation(4) (under the Jeffreys prior) to 
estimate R = Xa/Xb and HR = (A# — Xa)/(Xb + Xa)- 

First we assume the uniform prior of A. For the conditional distribution function of 

Xa/ Xb, 

\ poo \ 

P u (^<z) = / < z\X B = y) Pu (X B = y)dy 

*B JO *b 

poo pzy a p -x n,b p -y 



For the conditional probability density function of Xa/Xb, 

d f°° , f zy x a e~ x , ^y b e~ y 



poo Q pzy a -x v b p -y 

(zy) a e- zy y b e~ y , 
a\ b\ 
z a (a + b+l)\ 



{z + l) a + b + 2 a\b\' 



(10) 



This distribution is shown in Fig. 2 when a = 4, b = 3. It is easy to verify that the 
distribution is normalized, 



L 



l (z + iy+^a\b\ Z 1 j 

For the hardness ratio HR, the probability distribution of this hardness ratio is given 



by, 



Pui ^ , \ A <z)= [ P( Xb Xa < z\X A = x)p u (X A = x)dx 

*B + *A Jo *B + *a 

coo 



f°° 1 + Z 

= / P(X B < X A \X A = x)p u (X A = x)dx 

Jo 1 - z 

, dy) dx . (12) 



o jo 



b\ 
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The conditional probability density function is given by, 
,Xb — Xa s d X B — X A 

d r° , [t^ x v b e-y 



r°° d f^ x v b e~ y x a e~ x 

-L 



6! (1 - a! 

(l-^) a (l + ^) & (a + 6+1)! 

2(a+6+l) a !^| 



(13) 



This distribution is shown in Fig. 3 when a = 4, 6 = 3. It is easy to verify that the 
distribution is normalized, 



"°° ( l-zni + zT(a + 6 + 1)! _ 



Similarly, we obtain the conditional probability density function of R and HR under 
the Jeffreys prior assumption as follows, 

" (z + l)«+ b + 1 (2a-l)!!(26-l)!! 7 r' 1 J 

and 

^ HR ~ Z ) - (2a-l)!!(26-l)!!7r " (16) 

Using the result of p(R = z) and p(HR = z), we can estimate R and ifi?. 

Under the uniform prior assumption, since we have only one observation, we take the 
most probable value as the estimate of R = \a/Xb, denoted as (zr) u . Let 

t^Pu(j^ = z\n A = a,n B = b) = 0, (17) 

we obtain, 

<*«). = (18) 
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Traditional method 
Bayesian method ( uniform prior ' 
Bayesian method ( Jeffreys prior ) 




Fig. 2. — The probability density function of 
Aa/A b when a = 4, b = 3. The solid line rep- 
resents the distribution derived by our pro- 
posed method under the uniform prior, the 
doted line represents the distribution derived 
by our proposed method under the Jeffreys 
prior, the dashed line represents the Gaus- 
sian distribution derived by the traditional 
method. 
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Fig. 3. — The probability density function of 
(A s - A A )/(A B + A A ) when a = 4, b = 3.The 
solid line represents the distribution derived 
by our proposed method under the uniform 
prior, the doted line represents the distribu- 
tion derived by our proposed method under 
the Jeffreys prior, the dashed line represents 
the Gaussian distribution derived by the tra- 
ditional method. 



Similarly, we obtain the most probable value as the estimate of HR: 

b — a 

\Zhr)u 



b + a 



(19) 



Similarly, we get the most probable value of R and HR under the Jeffreys prior assump- 



tion, 



and 



(zhr)j = 



a -1/2 
6 + 3/2' 



b — a 
b + a-1' 



(20) 



(21) 



The highest posterior density (HPD) interval is used to give the error bars. The HPD 
interval under the confidence level a is the range of values which contain a fraction a of 
the probability, and the probability density within this interval is always higher than that 
outside the interval. 
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There are other point estimates and error estimates. For example, the mean value and 
the equal tailed interval. Since the distributions of R and HR are obtained, these alternative 
estimates can be easily derived. 

When \Ab cannot be ignored, equation (6) or equation (7) can be used to estimate R 
and HR. In this situation, it is difficult to give a simple analytic distribution function like 
equation (10) or equation (13); in this case, one can only use numerical integration to obtain 
the distribution of R and HR, and then do the estimate. 

4. Frequency Properties of Intervals 

We use the Monte Carlo simulations to investigate the statistical properties of our result, 
and compare them with the traditional methods. 

First we set A^ and A#. Do Poisson sampling for N times and each time we get a and 
b respectively. Each time we estimate R and HR using two kinds of methods. Finally we 
obtain that, for two methods, the mean square error of the point estimate, the coverage rate 
(the percentage of times during which the confidence interval contains the real value), and 
the mean confidence interval. 

The simulations contain two cases: low counts and high counts. In case 1, we first set 
Aa = 3 and \b = 3, then set \a = 4 and As = 3. In case 2, we first set \a = 20 and 
Xb = 20, then set A a = 20 and As = 15. The confidence level in the simulations is 90%. 
The results of the simulations are shown in table 1. 
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Table 1: Statistical Properties of Our Method and Traditional Method. 







TT 1 Cfh pmiTltQ 

-1-llfcill L/VJLL111jO 


1-1 4-3 


1-1 4-3 


mean square error 


our method R 


^291 D 6872 


3066 4397 


x. 1 * x. ' 1 x. 1 1 T~) 

traditional method R 


1 2399 1 ^398 


3628 ^767 


coverage rate 


our method n 


Q4 69% SQ 1 3% 


Q3 nn% 9n nn% 


I Llllllwl 111 UI1UI J JTTXi 


Q4 38% Q1 49% 


rz> 99% nn% 


our method R 


92 27% 92 3^% 


91 39% 87 87% 
yi.ou/o o 1 .oi /o 


(Jeffreys prior) 


82 82% 87 ^2% 


87 ^9% 84 62% 


traditional R 


84.70% 84.82% 


90.29% 89.46% 


method jjr 


88.22% 85.34% 


89.10% 88.26% 


mean 
confidence interval 


our method R 


3.40 4.67 


1.20 1.60 


(uniform prior) HR 


1.06 1.00 


0.50 0.53 


our method R 


4.05 5.56 


1.17 1.80 


(Jeffreys prior) HR 


1.13 1.05 


0.50 0.56 


traditional R 


4.91 5.90 


1.17 1.78 


method HR 


1.43 1.23 


0.52 0.55 



From the simulation results, we notice that our proposed method is more reliable than 
the traditional method when the counts are low. The reason is that the traditional method 
is based on using the Gaussian distribution to approach the Poisson distribution, which is 
not reliable when the counts are low. 



5. Application to N H Estimation 

Here we propose a method to estimate the hydrogen column density using data obtained 
with the Chandra X-ray observatory. Because of the high angular resolution of Chandra, a 
positive detection of a point source during a survey observation only requires several counts. 
Therefore our method will be more reliable when estimating the hardness ratio than the 
traditional method. The detail procedure of this application can be found in another paper 
(Wu et al. 2006). The basic idea is introduced as follows. 

The basic procedure consists of the following three steps: (1) calculate the relationship 
between the hardness ratios and N H values under certain spectral model; (2) estimate hard- 
ness ratios according to observed counts in different wavebands; and (3) interpolate the Nh 



-10- 




Fig. 4. — Relation between Nh and HR' & HR2'. The solid line with squares is the relation 
of Nh and HR', while the dotted line with triangles is the one of Nh and HR2' (Wu et al. 
2006) 

values and error intervals from hardness ratios. 

According to the most likely physical nature of the sources, we can assume a spectral 
model (e.g. power law with photon index T = 2 for typical X-ray binaries). Then we can 
use PIMMS (http://cxc.harvard.edu/toolkit/pimms.jsp) tool to calculate the relationship 
between hydrogen column density and the hardness ratio. Using PIMMS, one can get the 
count rate in certain energy band under a given X-ray spectrum and hydrogen column 
density. The count rate in the given energy band is just the A parameter in this energy 
band; this was our original movivation of defining the hardness ratios in terms of the A 
parameter. The calculated Nh (hydrogen column density) — HR relationships are shown in 
Fig. 7 for three different energy bands of A (1 - 3 keV), B (3 - 5 keV) and C (5 - 8 keV) for 
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a Chandra ACIS-I observation (Wu et al. 2006) respectively: HR' = ^f^-, HR2' = ^§^f ■ 
From Fig. 7, we can see that HR' or HR2' is more appropriate for Nh < 2 x 10 23 cm~ 2 or 
Nh > 2 x 10 23 cm -2 , respectively. Having the value and error interval of HR, we can finally 
do linear interpolation on curves in Fig. 7 to obtain the value and error interval of hydrogen 
column density. 



6. Summary & Discussion 

First we give the conditional probability distribution of A parameter under certain counts 
in a Poisson process using Bayesian statistics. According to this result we derive the proba- 
bility density function of two kinds of hardness ratios. We take the most probable values as 
the estimate of hardness ratios and the HPD intervals as the error intervals. Then we use 
Monte Carlo simulations to investigate the statistical properties of our results, and find that 
our method is more reliable than the traditional method when the counts are low. Finally 
we show how to estimate the hydrogen column density using hardness ratios. 

Our method developed in this paper provides a way to estimate the hydrogen column 
density of sources which are too faint to do spectral fitting. However the spectral shape for 
these sources must be assumed a prior. This method is especially convenient for a sample 
of faint sources with similar spectra. 

After this paper has been submitted initially on 06-03-29, we noticed another submitted 
paper (Park et al. 2006) which discusses the same statistical problem as we have done in this 
paper. In that paper the authors also used the Bayesian method to estimate the hardness 
ratio, and showed some applications on quiescent Low-Mass X-ray Binaries, the evolution of 
a flare, etc, therefore justifying the wide range of applications of such a statistical problem. 
Since the strict analytic solution of the hardness ratio distribution does not exist for general 
situations, the authors suggested methods by Monte Carlo and numerical integration to 
obtain the distribution in that paper. In our paper we find simple analytic solutions of the 
probability density functions of hardness ratios for the situations in which the background 
can be ignored. This will be useful and convenient for some applications, such as Chandra 
data in which background can be ignored for hardness ratio estimation of point sources. 

Finally, we note, under the advise of the referee, that in 1980s some studies have been 
done on the ratio of Poisson means both from a frequentist standpoint (James & Roos 
1980) and from a Bayesian standpoint (Helene 1984 and Prosper 1985). In this paper we 
used the Bayesian method under the uniform prior and the Jeffreys prior, made extensive 
comparisons between this method and traditional method, aiming explicitly at applications 
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in astrophysics. 

Xie Chen read the first draft carefully and gave many helpful suggestions, especially 
on English writing. We are particularly grateful to the referee for his pointing out relevant 
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